feat(core): add LiteLLM embedding provider #809

Open
RheagalFire wants to merge 11 commits into basicmachines-co:main from RheagalFire:feat/add-litellm-provider

Conversation

@RheagalFire

Summary

  • Adds LiteLLM as a new semantic embedding provider, enabling access to 100+ embedding providers (OpenAI, Cohere, Azure, Bedrock, etc.) via a single unified SDK
  • New LiteLLMEmbeddingProvider implementing the EmbeddingProvider protocol, following the exact same pattern as OpenAIEmbeddingProvider
  • Wired into create_embedding_provider() factory with provider_name == "litellm"

Changes

  • src/basic_memory/repository/litellm_provider.py - new LiteLLMEmbeddingProvider with:
    • litellm.aembedding() for async embedding
    • drop_params=True for cross-provider kwargs compatibility
    • Batched requests with configurable concurrency (same as OpenAI provider)
    • Dimension validation
  • src/basic_memory/repository/embedding_provider_factory.py - added elif provider_name == "litellm" branch
  • pyproject.toml - added litellm>=1.60.0,<2.0.0 to dependencies
  • tests/repository/test_litellm_provider.py - 13 unit tests (all passing)
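
The batching, concurrency cap, and dimension validation listed above could look roughly like the sketch below. It is not the code from this PR: `embed_in_batches`, the `EmbedBatch` callable, and the default sizes are hypothetical names and values chosen for illustration, and the per-batch call stands in for whatever wraps `litellm.aembedding()` in the real provider.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence

# Hypothetical callable that embeds one batch, e.g. a thin wrapper around
# litellm.aembedding(); this is not the signature used in the PR.
EmbedBatch = Callable[[list[str]], Awaitable[list[list[float]]]]


async def embed_in_batches(
    texts: Sequence[str],
    embed_batch: EmbedBatch,
    *,
    batch_size: int = 64,
    max_concurrency: int = 4,
    dimensions: Optional[int] = None,
) -> list[list[float]]:
    """Embed texts batch by batch with at most max_concurrency requests in flight."""
    batches = [list(texts[i : i + batch_size]) for i in range(0, len(texts), batch_size)]
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run(batch: list[str]) -> list[list[float]]:
        async with semaphore:
            vectors = await embed_batch(batch)
        if len(vectors) != len(batch):
            raise RuntimeError(f"expected {len(batch)} vectors, got {len(vectors)}")
        if dimensions is not None:
            for vector in vectors:
                if len(vector) != dimensions:
                    raise RuntimeError(
                        f"expected {dimensions} dimensions, got {len(vector)}"
                    )
        return vectors

    # asyncio.gather returns results in argument order, so the output order
    # matches the input order even when batches finish out of order.
    results = await asyncio.gather(*(run(batch) for batch in batches))
    return [vector for batch_vectors in results for vector in batch_vectors]
```

Validating inside each batch means a wrong-sized response fails fast instead of surfacing later as a corrupt index entry.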

Tests

Unit tests (13/13 passing):

$ pytest tests/repository/test_litellm_provider.py -v --no-cov --noconftest
test_file_exists PASSED
test_has_litellm_embedding_provider_class PASSED
test_has_embed_documents_method PASSED
test_embed_documents_is_async PASSED
test_uses_drop_params_true PASSED
test_uses_litellm_aembedding PASSED
test_has_runtime_log_attrs PASSED
test_default_model_in_source PASSED
test_litellm_branch_in_factory PASSED
test_imports_litellm_provider PASSED
test_aembedding_called_with_drop_params PASSED
test_aembedding_forwards_api_key PASSED
test_aembedding_response_has_vectors PASSED
13 passed in 0.04s

Example usage

# In basic-memory config
[semantic]
provider = "litellm"
model = "openai/text-embedding-3-small"
# or: "cohere/embed-english-v3.0", "azure/my-deployment", etc.

import asyncio

from basic_memory.repository.litellm_provider import LiteLLMEmbeddingProvider


async def main() -> None:
    provider = LiteLLMEmbeddingProvider(
        model_name="openai/text-embedding-3-small",
        dimensions=1536,
        # LiteLLM reads OPENAI_API_KEY, COHERE_API_KEY, etc. from env automatically
    )

    # await is only valid inside a coroutine, so the calls live in main()
    vectors = await provider.embed_documents(["hello world", "basic memory"])
    query_vec = await provider.embed_query("search term")


asyncio.run(main())

See https://docs.litellm.ai/docs/embedding/supported_embedding for all supported embedding models.

Impact

  • Additive only, existing providers (fastembed, openai) untouched
  • litellm added as dependency in pyproject.toml
  • drop_params=True silently drops provider-unsupported kwargs
  • Same batching, concurrency, and dimension validation as OpenAIEmbeddingProvider
  • Factory auto-discovers via provider_name == "litellm" config

@CLAassistant

CLAassistant commented May 8, 2026

CLA assistant check
All committers have signed the CLA.

@RheagalFire RheagalFire force-pushed the feat/add-litellm-provider branch from c029eb3 to 849f9f5 May 8, 2026 22:37

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c029eb3b86

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/basic_memory/repository/embedding_provider_factory.py
@RheagalFire
Author

cc @phernandez

@phernandez
Member

Thanks for opening this. I took a careful maintainer pass because this adds a new runtime provider and dependency. The direction is useful, but I do not think we can merge this as-is yet.

Main blockers:

  1. The new tests do not actually exercise LiteLLMEmbeddingProvider.

    Most of tests/repository/test_litellm_provider.py parses source text with AST/string checks, and the SDK interaction tests call fake.aembedding() directly instead of importing the provider and calling embed_documents() / embed_query(). I ran:

    python -m pytest tests/repository/test_litellm_provider.py --cov=basic_memory.repository.litellm_provider --cov-report=term-missing

    and coverage reported that basic_memory.repository.litellm_provider was never imported; the new provider file stayed at 0% coverage. That means regressions in the actual provider implementation would not fail the test suite.

    What I would expect here is closer to the existing OpenAI/FastEmbed provider tests: import LiteLLMEmbeddingProvider, monkeypatch sys.modules["litellm"] with an async aembedding, call the provider methods, and assert batching, output ordering, dimensions, API key forwarding, missing dependency behavior, and malformed response handling through the provider itself.

  2. The default LiteLLM provider config currently creates an invalid model/dimension pairing.

    BasicMemoryConfig.semantic_embedding_model defaults to "bge-small-en-v1.5". The new factory branch uses:

    model_name = app_config.semantic_embedding_model or "openai/text-embedding-3-small"

    so semantic_embedding_provider="litellm" with otherwise default config creates a LiteLLM provider with model_name="bge-small-en-v1.5" and dimensions=1536. I verified that a 384-dimensional response then fails the provider's dimension check.

    This should either map the Basic Memory default model to a valid LiteLLM default, adjust dimensions consistently, or require explicit LiteLLM model/dimensions config with a clear error. The important part is that selecting provider = "litellm" should not create a broken provider by default.

  3. uv.lock was not updated after adding litellm to pyproject.toml.

    uv lock --check fails on this branch with:

    The lockfile at `uv.lock` needs to be updated, but `--check` was provided.
    

    Please run uv lock and include the lockfile update if this stays as a direct project dependency.
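
The test shape requested in blocker 1 can be sketched as follows. To stay runnable on its own, the sketch uses `StandInProvider`, a hypothetical stand-in that is not the PR's `LiteLLMEmbeddingProvider`; a real test would import the provider from `basic_memory.repository.litellm_provider` and use pytest's `monkeypatch.setitem(sys.modules, "litellm", fake)` instead of the manual save and restore shown here. The response shape returned by the fake is also an assumption.

```python
import asyncio
import sys
import types


class StandInProvider:
    """Hypothetical stand-in for LiteLLMEmbeddingProvider, not the PR's code."""

    def __init__(self, model_name: str, dimensions: int) -> None:
        self.model_name = model_name
        self.dimensions = dimensions

    async def embed_documents(self, texts: list[str]) -> list[list[float]]:
        import litellm  # resolved through sys.modules at call time

        response = await litellm.aembedding(
            model=self.model_name, input=list(texts), drop_params=True
        )
        ordered = sorted(response.data, key=lambda item: item["index"])
        vectors = [list(item["embedding"]) for item in ordered]
        for vector in vectors:
            if len(vector) != self.dimensions:
                raise RuntimeError(f"expected {self.dimensions} dims, got {len(vector)}")
        return vectors


def make_fake_litellm(calls: list[dict]) -> types.ModuleType:
    """Build a fake litellm module whose aembedding records its kwargs."""

    async def aembedding(**kwargs):
        calls.append(kwargs)
        data = [
            {"index": i, "embedding": [float(i), 1.0]}
            for i in range(len(kwargs["input"]))
        ]
        # Reverse the items so the test proves the provider re-sorts by index.
        return types.SimpleNamespace(data=list(reversed(data)))

    fake = types.ModuleType("litellm")
    fake.aembedding = aembedding
    return fake


def test_embed_documents_goes_through_provider() -> None:
    calls: list[dict] = []
    previous = sys.modules.get("litellm")
    sys.modules["litellm"] = make_fake_litellm(calls)
    try:
        provider = StandInProvider("openai/text-embedding-3-small", dimensions=2)
        vectors = asyncio.run(provider.embed_documents(["a", "b", "c"]))
    finally:
        # Restore whatever was registered before the test ran.
        if previous is None:
            del sys.modules["litellm"]
        else:
            sys.modules["litellm"] = previous

    assert vectors == [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    assert calls[0]["drop_params"] is True
    assert calls[0]["input"] == ["a", "b", "c"]
```

Because the assertions run against the provider's return value rather than the fake's, a regression in ordering or dimension checks inside the provider would fail the test, which is the coverage gap the review describes.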

A smaller design question for maintainers/contributor: adding litellm as a default dependency is a fairly large dependency surface for all Basic Memory installs. That may still be acceptable, but it is worth explicitly confirming whether this should be a core dependency or an optional semantic-provider extra.

@phernandez
Member

@RheagalFire thanks again for contributing this. The overall idea is useful, but there are a few correctness and test-coverage issues we need fixed before we can move it toward merge.

Would you like to take a pass at addressing the review notes above? If so, we are happy to review another commit on this PR. If you would rather not, just say so and we can decide whether someone on the Basic Memory side should pick it up from here.

@phernandez phernandez added the On Hold (Don't review or merge. Work is pending) label May 14, 2026
@RheagalFire
Author

> @RheagalFire thanks again for contributing this. The overall idea is useful, but there are a few correctness and test-coverage issues we need fixed before we can move it toward merge.
>
> Would you like to take a pass at addressing the review notes above? If so, we are happy to review another commit on this PR. If you would rather not, just say so and we can decide whether someone on the Basic Memory side should pick it up from here.

Thanks for the review. I'm happy to pick up the changes.

@RheagalFire RheagalFire force-pushed the feat/add-litellm-provider branch from b106193 to d6100c7 May 18, 2026 21:15
Signed-off-by: RheagalFire <arishalam121@gmail.com>
…ckfile

Signed-off-by: RheagalFire <arishalam121@gmail.com>
@RheagalFire RheagalFire force-pushed the feat/add-litellm-provider branch from d6100c7 to 7d068f5 May 18, 2026 21:16
Signed-off-by: RheagalFire <arishalam121@gmail.com>
@RheagalFire
Author

RheagalFire commented May 18, 2026

@phernandez

Addressed all 3 blockers:

  1. Rewrote tests to exercise LiteLLMEmbeddingProvider directly -- 13 tests covering embed_query, embed_documents, batching, api_key forwarding, drop_params, dimension mismatch, missing dependency, output ordering, and factory selection
  2. Fixed default model mapping -- bge-small-en-v1.5 now remaps to openai/text-embedding-3-small in the factory (matching the OpenAI branch pattern)
  3. Also to confirm: uv lock --check now passes cleanly.
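
The default-model remap described in item 2 can be illustrated with a small sketch. The helper name `resolve_litellm_model` and the constant names are hypothetical; only the two model strings come from this thread. The last line shows why the original `or` fallback never applied.

```python
from typing import Optional

BASIC_MEMORY_DEFAULT_MODEL = "bge-small-en-v1.5"  # config default, 384 dims
LITELLM_DEFAULT_MODEL = "openai/text-embedding-3-small"  # 1536 dims


def resolve_litellm_model(configured_model: Optional[str]) -> str:
    """Return the model name the LiteLLM branch should use (hypothetical helper)."""
    if not configured_model or configured_model == BASIC_MEMORY_DEFAULT_MODEL:
        # The user did not pick a model, so use a default LiteLLM can route.
        return LITELLM_DEFAULT_MODEL
    return configured_model


# The original `or` fallback never fires because the config default is truthy.
naive_choice = BASIC_MEMORY_DEFAULT_MODEL or LITELLM_DEFAULT_MODEL
```

An explicitly configured model such as "cohere/embed-english-v3.0" passes through unchanged, so the remap only affects installs that left the model at its default.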

On the design question about dependency surface -- you are right, litellm pulls in a sizable transitive set. If you would prefer it as an optional extra rather than a core dependency, I am happy to move it to [project.optional-dependencies] so users install with pip install basic-memory[litellm]. Let me know your preference and I will adjust.

@phernandez
Member

Thanks @RheagalFire — the rewrite addresses all three earlier blockers cleanly, and the factory mapping mirrors the OpenAI branch nicely.

On the dependency-surface question: keep litellm as a core dependency. It aligns with our near-term roadmap where LiteLLM expands from embedding-only to a general LLM provider (BYO key / Ollama / cloud chat completions + provider fallback), so making it optional would just create churn when that work lands.

Before we merge I'd like to add two small things on top of your branch. I'll push a fixup commit so you don't have to context-switch:

  1. L2-normalize the LiteLLM output vectors. sqlite_search_repository assumes unit-norm vectors (the 1 - L²/2 cosine-similarity formula). FastEmbed has the same gap (see fix(core): L2-normalize FastEmbed vectors to satisfy unit-vector contract #843). OpenAI's text-embedding-3-* models return normalized vectors, so we get this for free there, but routing through LiteLLM exposes us to backends (Cohere, Vertex, Bedrock, etc.) that don't return normalized vectors by default. Same fix shape as #843.
  2. Mirror the OpenAI provider's response handling. Switch item["index"] / item["embedding"] to attribute access (item.index / item.embedding) and add the duplicate-index check it already has. Keeps the two providers visually parallel.

Both are small; I'll keep your authorship on the commit history.
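
A short sketch of why the unit-norm contract matters, assuming L in `1 - L²/2` denotes the Euclidean distance between the two vectors. For unit vectors, the squared distance equals 2 - 2·cos, so the formula recovers cosine similarity; for unnormalized vectors it does not. The function names are hypothetical and the zero-vector pass-through mirrors the behavior named in the fixup commit message.

```python
import math
from typing import Sequence


def l2_normalize(vector: Sequence[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def cosine_from_l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Apply 1 - L^2/2, which equals cosine similarity only for unit vectors."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 - squared_distance / 2.0


raw_a, raw_b = [3.0, 4.0], [4.0, 3.0]  # true cosine similarity is 24/25
unit_a, unit_b = l2_normalize(raw_a), l2_normalize(raw_b)

normalized_score = cosine_from_l2_distance(unit_a, unit_b)  # close to 0.96
unnormalized_score = cosine_from_l2_distance(raw_a, raw_b)  # 0.0, which is wrong
```

Normalizing at the provider boundary keeps the search repository's formula valid regardless of which backend produced the vectors.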

Bring the LiteLLM provider in line with the unit-norm contract from
sqlite_search_repository.py (lines 65-67): the cosine-similarity formula
`1 - L²/2` is correct only for unit-normalized vectors. LiteLLM routes to
many backends (Cohere, Vertex, Bedrock, etc.) that do not return normalized
embeddings, so normalize at the provider boundary — same fix shape as the
parallel FastEmbed change in basicmachines-co#843.

Also align the response handling with OpenAIEmbeddingProvider:
- attribute access on response items (item.index / item.embedding)
- explicit duplicate-index guard

Tests cover the three behaviors directly (unit norm, zero-vector pass-through,
duplicate-index error) and the existing ordering test now reconstructs the
expected normalized vectors so a normalization regression would be caught.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: phernandez <paul@basicmachines.co>
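
The response handling described in this commit message (attribute access plus a duplicate-index guard) might be sketched as below. `vectors_in_input_order` is a hypothetical helper, and the items are modeled with `SimpleNamespace` as an assumption about the response shape rather than LiteLLM's actual response classes.

```python
from types import SimpleNamespace
from typing import Iterable


def vectors_in_input_order(items: Iterable, expected_count: int) -> list[list[float]]:
    """Reorder response items by item.index and reject malformed responses."""
    by_index: dict[int, list[float]] = {}
    for item in items:
        if item.index in by_index:
            raise RuntimeError(f"duplicate embedding index {item.index}")
        by_index[item.index] = list(item.embedding)

    missing = [i for i in range(expected_count) if i not in by_index]
    if missing or len(by_index) != expected_count:
        raise RuntimeError(
            f"expected indexes 0..{expected_count - 1}, missing {missing}"
        )
    return [by_index[i] for i in range(expected_count)]


# Items arrive out of order; the helper restores input order.
response_items = [
    SimpleNamespace(index=1, embedding=[0.0, 1.0]),
    SimpleNamespace(index=0, embedding=[1.0, 0.0]),
]
ordered = vectors_in_input_order(response_items, expected_count=2)
```

Without the guard, a response that repeats an index would silently overwrite one vector and misalign every document after it.
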
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f9e7029ae7

Comment thread src/basic_memory/repository/litellm_provider.py
@phernandez phernandez removed the On Hold (Don't review or merge. Work is pending) label May 28, 2026
@phernandez phernandez changed the title from "feat: add LiteLLM as embedding provider" to "feat(core): add LiteLLM embedding provider" May 28, 2026
Signed-off-by: phernandez <paul@basicmachines.co>
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 187ca1a160

Comment thread src/basic_memory/repository/litellm_provider.py Outdated
Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez
Member

Added an opt-in live LiteLLM integration check for the provider matrix:

BASIC_MEMORY_RUN_LITELLM_INTEGRATION=1 OPENAI_API_KEY=... COHERE_API_KEY=... uv run pytest test-int/semantic/test_litellm_live_models.py -q

It includes built-in OpenAI and Cohere cases when those keys are present. Additional providers can be supplied with BASIC_MEMORY_TEST_LITELLM_CASES JSON, including document_input_type and query_input_type for asymmetric models. Local verification here exercised the skip path because this environment has no provider API keys set.
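
A minimal sketch of how such an opt-in gate and custom-case parsing could work. The two environment variable names and the `document_input_type` / `query_input_type` fields come from the comment above; everything else is an assumption for illustration, including the `LiveCase` dataclass, the `model` and `dimensions` fields, and the idea that the JSON is an array of objects. The real harness may use a different schema.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LiveCase:
    """One live provider case (hypothetical schema)."""

    model: str
    dimensions: int
    document_input_type: Optional[str] = None
    query_input_type: Optional[str] = None


def live_tests_enabled(env: Mapping[str, str]) -> bool:
    """Live tests run only when the opt-in flag is set to "1"."""
    return env.get("BASIC_MEMORY_RUN_LITELLM_INTEGRATION") == "1"


def parse_extra_cases(env: Mapping[str, str]) -> list[LiveCase]:
    """Parse additional cases from BASIC_MEMORY_TEST_LITELLM_CASES, if present."""
    raw = env.get("BASIC_MEMORY_TEST_LITELLM_CASES", "").strip()
    if not raw:
        return []
    payload = json.loads(raw)
    if not isinstance(payload, list):
        raise ValueError("BASIC_MEMORY_TEST_LITELLM_CASES must be a JSON array")
    return [
        LiveCase(
            model=entry["model"],
            dimensions=int(entry["dimensions"]),
            document_input_type=entry.get("document_input_type"),
            query_input_type=entry.get("query_input_type"),
        )
        for entry in payload
    ]
```

In a pytest module, the gate would typically feed `pytest.mark.skipif(not live_tests_enabled(os.environ), reason=...)`, which matches the skip path exercised when no keys are set.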

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2c8975e9bb

Comment thread src/basic_memory/repository/litellm_provider.py Outdated
@phernandez
Member

Full base-repo Tests workflow passed for SHA 2c8975e9bb91dae0129a6531a7ae005d2fba10ec: https://github.com/basicmachines-co/basic-memory/actions/runs/26607333542. I triggered it via a temporary base-repo branch so the fork PR got the same full matrix coverage as an in-repo branch.

Signed-off-by: phernandez <paul@basicmachines.co>
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a758657537

Comment thread src/basic_memory/repository/litellm_provider.py
Signed-off-by: phernandez <paul@basicmachines.co>
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8270405a1c

Comment thread src/basic_memory/repository/litellm_provider.py Outdated
Signed-off-by: phernandez <paul@basicmachines.co>
@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a3739fa72e

Comment thread src/basic_memory/repository/litellm_provider.py
Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez
Member

Validation update for the LiteLLM correctness fixes on de1f9778:

  • Local targeted tests: uv run pytest tests/repository/test_litellm_provider.py tests/test_config.py::TestSemanticSearchConfig test-int/semantic/test_litellm_live_models.py -q -> 41 passed, 1 skipped.
  • Local checks: targeted ruff check, targeted ruff format --check, just typecheck, and git diff --check all passed.
  • Base-repo exact-SHA CI: Tests workflow passed 16/16 jobs for de1f977863a93c9e9c04d546925db1363ee2247a: https://github.com/basicmachines-co/basic-memory/actions/runs/26613190031

The live LiteLLM suite is opt-in so normal CI does not spend external API quota. Built-in live cases run with BASIC_MEMORY_RUN_LITELLM_INTEGRATION=1 plus OPENAI_API_KEY for openai/text-embedding-3-small and COHERE_API_KEY for cohere/embed-english-v3.0. Azure/OpenAI deployment aliases can be covered through BASIC_MEMORY_TEST_LITELLM_CASES with forward_dimensions: true and the Azure env vars LiteLLM expects.

Signed-off-by: phernandez <paul@basicmachines.co>
@phernandez
Member

Added a repeatable LiteLLM live evaluation harness in 750c7c99:

  • test-int/semantic/litellm_live_harness.py now owns live case parsing, built-in OpenAI/Cohere cases, vector normalization/dimension checks, ranking sanity, latency metrics, table/JSON output, and custom cases files.
  • test-int/semantic/test_litellm_live_models.py now uses the same harness as the human runner, so pytest and manual validation stay aligned.
  • just test-litellm-live runs the opt-in harness with BASIC_MEMORY_RUN_LITELLM_INTEGRATION=1; docs now list the required keys and include Azure/NVIDIA cases-file examples.

Local verification on the new commit:

  • uv run pytest tests/repository/test_litellm_provider.py tests/test_config.py::TestSemanticSearchConfig test-int/semantic/test_litellm_live_harness.py test-int/semantic/test_litellm_live_models.py -q -> 46 passed, 1 skipped.
  • just test-litellm-live --cases-file <tmp missing-key case> --json correctly reports the missing env var without making a network call.
  • Targeted ruff check, targeted ruff format --check, just typecheck, and git diff --check all passed.
  • Base-repo exact-SHA Tests workflow passed 16/16 jobs for 750c7c9920bb9d890b8c6cef1c9dd9902260502c: https://github.com/basicmachines-co/basic-memory/actions/runs/26616059046

Live provider keys for manual evaluation:

  • Minimum: OPENAI_API_KEY and COHERE_API_KEY.
  • Azure alias coverage: AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION, plus a deployment name in a cases file with forward_dimensions: true.
  • Optional NVIDIA NIM coverage: NVIDIA_NIM_API_KEY and a cases file using document_input_type: "passage" and query_input_type: "query".
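
The missing-key behavior mentioned above (report the absent env var without making a network call) can be sketched as a pre-flight check. The env var names are the ones listed in this comment; the `REQUIRED_ENV_BY_PREFIX` table, the helper name, and the mapping from LiteLLM model prefixes such as `azure/` or `nvidia_nim/` to those variables are assumptions, not the harness's actual logic.

```python
from typing import Mapping

# Assumed mapping from model prefix to the env vars listed in this comment.
REQUIRED_ENV_BY_PREFIX: dict[str, tuple[str, ...]] = {
    "openai": ("OPENAI_API_KEY",),
    "cohere": ("COHERE_API_KEY",),
    "azure": ("AZURE_API_KEY", "AZURE_API_BASE", "AZURE_API_VERSION"),
    "nvidia_nim": ("NVIDIA_NIM_API_KEY",),
}


def missing_env_vars(model: str, env: Mapping[str, str]) -> list[str]:
    """Return the required env vars that are unset or empty for this model."""
    prefix = model.split("/", 1)[0]
    required = REQUIRED_ENV_BY_PREFIX.get(prefix, ())
    return [name for name in required if not env.get(name)]


# A case whose keys are absent is reported before any request is sent.
report = missing_env_vars("azure/my-deployment", {"AZURE_API_KEY": "placeholder"})
```

Running the check before constructing a provider keeps skipped cases free of network traffic and makes the skip reason explicit in the table or JSON output.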


3 participants